# End-to-end audio processing
Voila Autonomous Preview
MIT
Voila is a family of large voice-language foundation models designed to improve human-computer interaction, supporting real-time, low-latency voice interaction and multilingual processing.
Text-to-Audio
Transformers, Supports Multiple Languages

maitrix-org
332
8
Voila Tokenizer
MIT
Voila is a large-scale voice-language foundation model series designed to enhance human-computer interaction, supporting multiple audio tasks and languages.
Text-to-Audio
Transformers, Supports Multiple Languages

maitrix-org
4,912
3
AST Finetuned Speech Commands V2
BSD-3-Clause
An Audio Spectrogram Transformer (AST) model fine-tuned on the Speech Commands v2 dataset for audio classification, achieving 98.12% accuracy.
Audio Classification
Transformers

MIT
10.94k
15